The required libraries are loaded. RomicsProcessor, written by Geremy Clair (2021), is used to apply trackable transformations and statistics to the dataset.
library("RomicsProcessor")
library("DT") #for the rendering of the enrichment tables
library("proteinminion") #this package was created by Geremy Clair (2021) to download UniProt protein details
Using the package ‘Protein Mini-on’ (Geremy Clair 2021, in prep.), the FASTA files for the human and bovine proteomes were downloaded from UniProt on June 15th, 2021.
if(!file.exists("./03 - Output files/Uniprot_Bos_taurus_proteome_UP000009136_2020_06_15.fasta")){
download_UniProtFasta(proteomeID = "UP000009136",reviewed = F,export = TRUE, file="./03 - Output files/Uniprot_Bos_taurus_proteome_UP000009136_2020_06_15.fasta")
}
if(!file.exists("./03 - Output files/Uniprot_Homo_sapiens_proteome_UP000005640_2020_06_15.fasta")){
download_UniProtFasta(proteomeID = "UP000005640",reviewed = F,export = TRUE, file="./03 - Output files/Uniprot_Homo_sapiens_proteome_UP000005640_2020_06_15.fasta")
}
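The two guards above follow the same pattern: download only when the target file is absent, so that re-running the report reuses the cached copy. A minimal sketch of that pattern as a reusable helper is given below. `fetch_if_missing` and `fake_download` are hypothetical names introduced for illustration and are not part of proteinminion; the stand-in download only writes a placeholder file.

```r
# Hypothetical helper (not part of proteinminion): run a download function
# only when the target file does not exist yet.
fetch_if_missing <- function(path, fetch_fun) {
  if (!file.exists(path)) {
    fetch_fun(path)
  }
  invisible(path)
}

# Stand-in "download" that writes a placeholder and counts its calls.
tmp <- file.path(tempdir(), "example_proteome.fasta")
unlink(tmp)  # make sure the demo starts without a cached copy
calls <- 0
fake_download <- function(path) {
  calls <<- calls + 1
  writeLines(">placeholder", path)
}

fetch_if_missing(tmp, fake_download)
fetch_if_missing(tmp, fake_download)  # skipped: the file already exists
```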
The iBAQ data contained in the protein table were loaded, along with the corresponding metadata.
data<-data.frame(extractMaxQuant("./01 - Source files/proteinGroups.txt",quantification_type = "iBAQ",cont.rm = T,site.rm = T,rev.rm = T))
## [1] "141 Proteins were removed (protein(s) only identified by site,contaminant(s),reverse hit(s))"
## [1] "iBAQ quantification was used"
IDsdetails<-extractMaxQuantIDs("./01 - Source files/proteinGroups.txt",cont.rm = T,site.rm = T,rev.rm = T)
## [1] "141 Proteins were removed (protein(s) only identified by site,contaminant(s),reverse hit(s))"
IDsdetails<-cbind(UniProt_Name=sub(".*\\|","",IDsdetails$protein.ids), IDsdetails)
colnames(data)<- sub("iBAQ.","",colnames(data))
metadata<- read.csv(file = "./01 - Source files/metadata.csv")
colnames(metadata)<-tolower(colnames(metadata))
write.csv(extractMaxQuantIDs("./01 - Source files/proteinGroups.txt",cont.rm = T,site.rm = T,rev.rm = T),"./03 - Output files/MaxQuantIDS.csv")
## [1] "141 Proteins were removed (protein(s) only identified by site,contaminant(s),reverse hit(s))"
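The `UniProt_Name` column above is derived with `sub(".*\\|","",...)`, which removes everything up to and including the last `|` of each identifier. The short base R sketch below shows the effect on a few illustrative identifiers written in the UniProt FASTA header style (the values are made up for the demonstration and are not taken from the dataset).

```r
# Illustrative identifiers only; not taken from proteinGroups.txt.
ids <- c("sp|P02769|ALBU_BOVIN",
         "P12345",
         "sp|P00001|AAA_HUMAN;sp|P00002|BBB_HUMAN")

# ".*\\|" is greedy: it matches up to and including the LAST "|".
uniprot_name <- sub(".*\\|", "", ids)
print(uniprot_name)
# [1] "ALBU_BOVIN" "P12345"     "BBB_HUMAN"
```

Because the match is greedy, an entry that lists several semicolon-separated members keeps only the name of the last member, and an identifier without any `|` is returned unchanged.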
The data and metadata were placed in a romics_object and the sample names were retrieved from the metadata. The condition will be used for the coloring of the figures and for the statistics.
romics_proteins<- romicsCreateObject(data,metadata,main_factor = "Condition")
romics_proteins<- romicsSampleNameFromFactor(romics_proteins,factor = "sample_names")
The missingness was evaluated for each channel/sample
romics_proteins<- romicsZeroToMissing(romics_proteins)
romicsPlotMissing(romics_proteins)
The proteins retained for quantification were required to have at least 75% complete values (3/4 samples) for a given condition. The overall missingness was evaluated again after filtering.
romics_proteins<-romicsFilterMissing(romics_proteins,percentage_completeness = 75)
## [1] "28 rows were removed for the data"
## [1] "Based on the minimum completeness set at 75%"
## [1] "at least the following number of sample(s) containing data was required:"
## EC_mono EVT_EC_CO EVT_EC_DSC_TRI EVT_mono
## 3 3 3 3
print(paste0(nrow(romics_proteins$data),"/", nrow(romics_proteins$original_data)," proteins remained after filtering", " (",round(nrow(romics_proteins$data)/nrow(romics_proteins$original_data)*100,2),"%)."))
## [1] "484/512 proteins remained after filtering (94.53%)."
romicsPlotMissing(romics_proteins)
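The filtering rule can be illustrated in base R. The sketch below assumes that a protein is kept when at least one condition reaches the completeness threshold; this is one reading of "for a given condition" and may differ from what `romicsFilterMissing` does internally. The toy matrix and the condition labels `A` and `B` are hypothetical.

```r
# Toy matrix: 3 proteins x 8 samples, two hypothetical conditions of 4 samples.
x <- matrix(c(
  1,  2,  3,  NA, NA, NA, NA, 4,   # 3/4 complete in A
  NA, NA, 1,  2,  NA, NA, 3,  4,   # 2/4 complete in both conditions
  NA, NA, NA, NA, 5,  6,  7,  8    # 4/4 complete in B
), nrow = 3, byrow = TRUE)
condition <- rep(c("A", "B"), each = 4)
min_fraction <- 0.75

# Fraction of non-missing values per protein within each condition.
complete_by_group <- sapply(unique(condition), function(g) {
  rowMeans(!is.na(x[, condition == g, drop = FALSE]))
})

# Assumption: keep a protein if ANY condition reaches the threshold.
keep <- apply(complete_by_group >= min_fraction, 1, any)
x_filtered <- x[keep, , drop = FALSE]
```

With four samples per condition, a 75% threshold corresponds to three samples with data, which matches the counts printed in the log above.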
The data was log2 transformed, and the distribution boxplots were then plotted.
romics_proteins<-log2transform(romics_proteins)
distribBoxplot(romics_proteins)
As the same quantity of protein was labelled for each sample, the distribution of protein abundance is expected to be centered. A median centering was therefore performed before plotting the distribution boxplots again.
romics_proteins<-medianCenterSample(romics_proteins)
distribBoxplot(romics_proteins)
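For readers unfamiliar with these two steps, the base R sketch below reproduces the idea on simulated intensities. It assumes that median centering means subtracting each sample's own median from its log2 values; whether `medianCenterSample` additionally rescales or re-adds a global median is not stated here.

```r
set.seed(1)
# Simulated raw intensities: 10 proteins x 4 samples, with one missing value.
x <- matrix(2^rnorm(40, mean = 20, sd = 2), nrow = 10,
            dimnames = list(NULL, paste0("s", 1:4)))
x[2, 3] <- NA

x_log <- log2(x)

# Assumption: centering subtracts the per-sample (column) median.
sample_medians <- apply(x_log, 2, median, na.rm = TRUE)
x_centered <- sweep(x_log, 2, sample_medians, FUN = "-")

boxplot(x_centered, main = "log2 intensities after median centering")
```

After centering, every sample has a median of zero, so the boxplots line up on the same baseline.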
The grouping of the samples is checked by hierarchical clustering
romicsHclust(romics_proteins)
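A comparable check can be sketched with base R alone. The example assumes Euclidean distances between samples and complete linkage; the distance and linkage actually used by `romicsHclust` are not stated in this report. The data are simulated with two well-separated hypothetical groups.

```r
set.seed(2)
# Simulated data: 20 proteins x 6 samples, groups A and B shifted apart.
x <- cbind(matrix(rnorm(60, mean = 0), nrow = 20),
           matrix(rnorm(60, mean = 5), nrow = 20))
colnames(x) <- c(paste0("A", 1:3), paste0("B", 1:3))

d <- dist(t(x))                        # distances between samples (columns)
hc <- hclust(d, method = "complete")   # assumed linkage
groups <- cutree(hc, k = 2)

plot(hc, main = "Sample clustering")
```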
Because some of the subsequent statistics require complete data, missing values were imputed under the assumption that “non-detected” proteins were either low abundance or missing, using the method developed by Tyanova et al. (PMID: 27348712). The gray distribution is the data distribution; the yellow distribution is that of the random values used for imputation.
imputeMissingEval(romics_proteins,nb_stdev = 2,width_stdev = 0.5, bin=1)
romics_proteins<-imputeMissing(romics_proteins,nb_stdev = 2,width_stdev = 0.5)
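The idea behind this imputation can be sketched in base R: missing values are replaced by random draws from a normal distribution that is shifted down and narrowed relative to the observed data. The function below is a hypothetical illustration, not the RomicsProcessor implementation. It assumes the shift and width are computed per sample from that sample's observed mean and standard deviation, with `nb_stdev` and `width_stdev` mirroring the arguments used above.

```r
set.seed(3)
# Simulated log2 data: 50 proteins x 4 samples with 20 missing values.
x <- matrix(rnorm(200, mean = 25, sd = 2), nrow = 50)
x[sample(length(x), 20)] <- NA

# Hypothetical helper: down-shifted normal imputation, column by column.
impute_downshift <- function(m, nb_stdev = 2, width_stdev = 0.5) {
  for (j in seq_len(ncol(m))) {
    miss <- is.na(m[, j])
    mu <- mean(m[, j], na.rm = TRUE)
    s <- sd(m[, j], na.rm = TRUE)
    # Draw from N(mu - nb_stdev * s, (width_stdev * s)^2) for missing cells.
    m[miss, j] <- rnorm(sum(miss), mean = mu - nb_stdev * s,
                        sd = width_stdev * s)
  }
  m
}

x_imputed <- impute_downshift(x, nb_stdev = 2, width_stdev = 0.5)
```

Observed values are left untouched; only the missing cells receive values, which sit in the low-abundance tail of each sample's distribution.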
The grouping of the samples was checked again after imputation, this time by PCA
indPCAplot(romics_proteins, plotType = "percentage")
indPCAplot(romics_proteins, plotType = "individual",Xcomp=1,Ycomp =2)
indPCAplot(romics_proteins, plotType = "individual",Xcomp=2,Ycomp =3)
indPCA3D(romics_proteins)
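The same kind of plots can be approximated with `prcomp` from base R. The sketch assumes samples are treated as observations and that the data are centered but not scaled; the preprocessing applied by `indPCAplot` is not stated in this report. The data are simulated.

```r
set.seed(4)
# Simulated complete data: 20 proteins x 6 samples.
x <- matrix(rnorm(120), nrow = 20,
            dimnames = list(NULL, paste0("s", 1:6)))

# Samples as rows so that each point in the score plot is one sample.
pca <- prcomp(t(x), center = TRUE, scale. = FALSE)

# Percentage of variance explained by each component.
pct_var <- 100 * pca$sdev^2 / sum(pca$sdev^2)
barplot(pct_var, names.arg = paste0("PC", seq_along(pct_var)),
        ylab = "% variance explained")

# Sample coordinates on the first two components.
scores <- pca$x[, 1:2]
plot(scores, xlab = "PC1", ylab = "PC2")
text(scores, labels = rownames(scores), pos = 3)
```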
The means and standard deviations are calculated for each group
romics_proteins<-romicsMean(romics_proteins)
## [1] "The Statistics layer was added to your object"
## [1] "Means columns (*_mean) were added to the statistics"
romics_proteins<-romicsSd(romics_proteins)
## [1] "The standard deviation columns (*_sd) were added to the statistics"
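As an illustration of what such per-group columns contain, the base R sketch below computes row-wise means and standard deviations for two hypothetical conditions. The `_mean` and `_sd` suffixes follow the log messages above; the toy values are made up.

```r
# Toy data: 2 proteins x 6 samples, conditions A and B with 3 samples each.
x <- matrix(c(1, 2, 3, 10, 20, 30,
              4, 4, 4,  5,  7,  9), nrow = 2, byrow = TRUE)
condition <- rep(c("A", "B"), each = 3)

group_mean <- sapply(unique(condition), function(g) {
  rowMeans(x[, condition == g, drop = FALSE])
})
group_sd <- sapply(unique(condition), function(g) {
  apply(x[, condition == g, drop = FALSE], 1, sd)
})

colnames(group_mean) <- paste0(colnames(group_mean), "_mean")
colnames(group_sd) <- paste0(colnames(group_sd), "_sd")
stats <- cbind(group_mean, group_sd)
print(stats)
```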
Some general statistics are performed (ANOVA, t-tests).
romics_proteins<-romicsANOVA(romics_proteins)
## [1] "The ANOVA columns (ANOVA_p and ANOVA_padj) were added to the statistics"
romics_proteins<-romicsTtest(romics_proteins,var.equal = T)
## [1] "T_test columns were added to the statistics"
print(paste0(sum(romics_proteins$statistics$ANOVA_p<0.05), " proteins had an ANOVA p<0.05."))
## [1] "189 proteins had an ANOVA p<0.05."
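A protein-wise one-way ANOVA of this kind can be sketched in base R as follows. The example assumes the adjusted p-values use the Benjamini-Hochberg correction; the method behind `ANOVA_padj` is not stated in this report. The data and the condition labels are simulated.

```r
set.seed(5)
condition <- factor(rep(c("A", "B", "C"), each = 4))

# Simulated data: 10 proteins x 12 samples, protein 1 shifted in condition C.
x <- matrix(rnorm(10 * 12), nrow = 10)
x[1, condition == "C"] <- x[1, condition == "C"] + 10

# One-way ANOVA per protein (row); extract the p-value of the condition term.
anova_p <- apply(x, 1, function(v) {
  summary(aov(v ~ condition))[[1]][["Pr(>F)"]][1]
})

# Assumption: Benjamini-Hochberg adjustment for multiple testing.
anova_padj <- p.adjust(anova_p, method = "BH")

print(paste0(sum(anova_p < 0.05), " proteins had an ANOVA p<0.05."))
```

Note that the count reported above is based on the unadjusted `ANOVA_p` column; filtering on the adjusted values would generally retain fewer proteins.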
A heatmap of the proteins passing an ANOVA p<0.05 was plotted, and the resulting clusters were saved in the statistics. Z-scores were then computed and added to the statistics as well.
romicsHeatmap(romics_proteins,variable_hclust_number = 4,ANOVA_filter = "p", p=0.05,sample_hclust = F)
romics_proteins<-romicsVariableHclust(romics_proteins,clusters = 4,ANOVA_filter = "p",p= 0.05,plot = F)
## [1] "The columns hclust_clusters was added to the statistics"
romics_proteins<-romicsZscores(romics_proteins)
## [1] "Z_score_ columns were added to the statistics"
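The clustering and scaling steps can be approximated in base R. The sketch assumes that Z-scores are computed per protein across samples and that proteins are clustered on Euclidean distances with complete linkage, then cut into four clusters as in the calls above; the exact choices made by `romicsVariableHclust` and `romicsZscores` are not stated in this report. The data are simulated.

```r
set.seed(6)
# Simulated data: 40 proteins x 8 samples.
x <- matrix(rnorm(40 * 8), nrow = 40)

# Row-wise Z-scores: (value - protein mean) / protein standard deviation.
z <- t(scale(t(x)))

# Cluster proteins and cut the tree into 4 clusters.
hc <- hclust(dist(z), method = "complete")
clusters <- cutree(hc, k = 4)
table(clusters)

heatmap(z, Rowv = as.dendrogram(hc), Colv = NA, scale = "none")
```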
The data was exported
results<-romicsExportData(romics_proteins,statistics = T,missing_data = T)
write.csv(results, "./03 - Output files/implantation_proteomics_complete_results.csv")